This workshop was developed with R 4.2.1; we suggest updating your R installation before you begin.
Prior to starting this workshop, you should run the following lines of code to install and update the packages needed for these demos.
## Make a list of packages
list_of_packages <- c("dplyr",
"tidyr",
"plyr",
"spocc",
"ridigbio",
"tibble",
"tidyverse",
"rbison",
"CoordinateCleaner",
"lubridate",
"ggplot2",
"gtools",
"raster",
"sp",
"spatstat",
"spThin",
"fields",
"ggspatial",
"rgdal",
"rangeBuilder",
"sf",
"dismo",
"devtools",
"ENMeval",
"caret",
"usdm",
"stringr",
"factoextra",
"FactoMineR",
"multcompView",
"ggsci",
"gridExtra",
"ecospat",
"rJava",
"viridis",
"ENMTools",
"ape",
"RStoolbox",
"hypervolume",
"phytools",
"picante",
"leaflet")
## Here we identify which packages are not installed and install these for you
new.packages <- list_of_packages[!(list_of_packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
## Check version of packages
### Compare to package version used for demo creation
### This data frame has two columns: (1) package (2) version, indicating which
### version of the package was used to develop this workshop
versiondf <- read.csv("data/setup/setup.csv", stringsAsFactors = FALSE)
### Save your version under a new column named "current_version"
versiondf$current_version <- as.character(do.call(c, lapply(list_of_packages, packageVersion)))
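As an aside, packageVersion() returns a package_version object rather than a plain string, which is why the as.character() call above is needed before comparing against the csv column. A quick sketch:

```r
v <- packageVersion("stats")  # "stats" ships with R, so this always works
class(v)                      # "package_version" "numeric_version"
as.character(v)               # a plain string, comparable to the csv values
```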
### Compare the version the workshop was developed with to your current version
updatelist <- versiondf[which(versiondf$version != versiondf$current_version), ]
### Update packages with outdated versions
lapply(as.character(updatelist$package), install.packages, ask = FALSE)
## Make sure all packages load
## Packages that are FALSE did not load
### If anything prints here, then something went wrong and a package did not install
loaded <- lapply(list_of_packages, require, character.only = TRUE)
list_of_packages[loaded == FALSE]
# Install packages not in CRAN
library(devtools)
install_github('johnbaums/rmaxent')
install_github("marlonecobos/kuenm")
## Check and make sure all github packages load
github_packages <- c("rmaxent", "kuenm")
github_loaded <- lapply(github_packages, require, character.only = TRUE)
github_packages[github_loaded == FALSE]
tidyverse is a collection of R packages used for “everyday data analysis,” including ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats. We use dplyr and ggplot2 throughout many of the R scripts.
ridigbio is an R package that queries the iDigBio API.
spocc queries data from GBIF, USGS’ Biodiversity Information Serving Our Nation (BISON), iNaturalist, Berkeley Ecoinformatics Engine, eBird, iDigBio, VertNet, Ocean Biogeographic Information System (OBIS), and Atlas of Living Australia (ALA).
RCurl and rjson are used to query application programming interfaces (APIs) for online databases.
Geospatial packages include sp, raster, maptools, maps, rgdal, RStoolbox, and mapproj.
ggplot2 is used to plot and visualize data. ggbiplot is an add-on to ggplot2.
ENMTools, ENMeval, and dismo are specific to ecological niche modeling.
hypervolume and caret have functions related to statistics.
phytools and picante are related to phylogenetics.
factoextra and FactoMineR are used for statistics related to principal components analysis.
On macOS, compiling packages from source requires the Xcode command-line tools:
system("xcode-select --install")
Note: spThin and fields do not compile from source; if prompted, install the binary versions instead.
If you are still having issues installing ggbiplot, you may have to uninstall the package ‘backports’ and restart R.
This exercise requires the package ENMTools. This package requires Java (and the Oracle Java JDK) and that the maxent.jar file is in the folder that contains the dismo package (this package is a dependency of ENMTools). To find out where to put the jar file, run the command:
system.file("java", package="dismo")
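One quick way to confirm the setup is to check that maxent.jar exists in that folder (a small sketch; maxent.jar is the file name dismo looks for):

```r
jar_dir <- system.file("java", package = "dismo")
file.exists(file.path(jar_dir, "maxent.jar"))  # TRUE once the jar is in place
```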
Once you have your Java and maxent file set, you are ready to go! If you have more issues, find some help here.
To download the jar file to the right directory —
devtools::install_github("johnbaums/rmaxent")
rmaxent::get_maxent(version = "latest", quiet = FALSE)
The previous step may mask rgdal! Make sure to reload it!
For rJava, the newest package requires R >= 3.6.0. dismo only requires rJava >= 0.9-7, so you can download version 0.9-7 here.
If you are experiencing errors with rJava:
## Remove rJava
remove.packages("rJava")
## Update your R Java configuration
system("sudo R CMD javareconf")
## Restart R and check to see if the path matches
Sys.getenv("DYLD_FALLBACK_LIBRARY_PATH")
## Reinstall rJava
install.packages("rJava")
library(rJava)
If you are still having issues with rJava - check to see if you are using Java 12 (this should be included in the error message):
system("java -version")
The initial release of Java 12 does not work; install an older version instead, either from the Oracle website or with Homebrew.
system("brew cask install homebrew/cask-versions/java11")
Restart R and redo the previous steps. Required: uninstall the newer versions of Java before repeating, and restart R afterward.
## Uninstall Java 12, then repeat the following
## Remove rJava
remove.packages("rJava")
## Update your R Java configuration
system("sudo R CMD javareconf")
## Restart R and check to see if the path matches
Sys.getenv("DYLD_FALLBACK_LIBRARY_PATH")
## Reinstall rJava
install.packages("rJava")
library(rJava)
More debugging: Check out jerrybreak’s comment from 12/30/2021.
If you have additional issues, please let us know!
ML Gaynor.
library(tidyverse)
library(spocc)
library(ridigbio)
library(rbison)
library(leaflet)
This is a function I created with Natalie Patten. It will be part of her R package gatoRs (Geographic And Taxonomic Occurrence R-based Scrubbing).
source("functions/gators.R")
iDigBio_GU <- idig_search_records(rq=list(scientificname="Galax urceolata"))
iDigBio_GU_family <- idig_search_records(rq=list(family="Diapensiaceae"), limit=1000)
What if you want to read in all the points for a family within an extent?
Hint: Use the iDigBio portal to determine the bounding box for your region of interest.
The bounding box delimits the geographic extent.
rq_input <- list("scientificname"=list("type"="exists"),
"family"="Diapensiaceae",
geopoint=list(
type="geo_bounding_box",
top_left=list(lon = -98.16, lat = 48.92),
bottom_right=list(lon = -64.02, lat = 23.06)
)
)
Search using the input you just made
iDigBio_GU_family_USA <- idig_search_records(rq_input, limit=1000)
write.csv(iDigBio_GU, "data/download/iDigBio_GU_20220712.csv",
row.names = FALSE)
write.csv(iDigBio_GU_family, "data/download/iDigBio_GU_family_20220712.csv",
row.names = FALSE)
Shortia_galacifolia <- c("Shortia galacifolia", "Sherwoodia galacifolia")
Galax_urceolata <- c("Galax urceolata", "Galax aphylla")
Pyxidanthera_barbulata <- c("Pyxidanthera barbulata","Pyxidanthera barbulata var. barbulata")
Pyxidanthera_brevifolia <- c("Pyxidanthera brevifolia", "Pyxidanthera barbulata var. brevifolia")
This function downloads records for all names listed from iDigBio, GBIF, and BISON. It keeps specific columns from each database.
spocc_combine(Shortia_galacifolia,
"data/download/raw/Shortia_galacifolia_raw_20220712.csv")
spocc_combine(Galax_urceolata,
"data/download/raw/Galax_urceolata_raw_20220712.csv")
spocc_combine(Pyxidanthera_barbulata,
"data/download/raw/Pyxidanthera_barbulata_raw_20220712.csv")
spocc_combine(Pyxidanthera_brevifolia,
"data/download/raw/Pyxidanthera_brevifolia_raw_20220712.csv")
rawdf <- read.csv("data/download/raw/Shortia_galacifolia_raw_20220712.csv")
What columns are included?
names(rawdf)
## [1] "name" "basis"
## [3] "date" "institutionID"
## [5] "collectionCode" "collectionID"
## [7] "country" "county"
## [9] "state" "locality"
## [11] "Latitude" "Longitude"
## [13] "ID" "coordinateUncertaintyInMeters"
## [15] "informationWithheld" "habitat"
## [17] "prov" "spocc.latitude"
## [19] "spocc.longitude" "spocc.prov"
## [21] "spocc.date" "spocc.name"
Here is a table of the columns returned for each species in spocc_combine:
How many observations do we start with?
nrow(rawdf)
## [1] 815
The error message here indicates many points do not have long/lat values (more in 02).
leaflet(rawdf) %>%
addMarkers(label = paste0(rawdf$Longitude, ", ", rawdf$Latitude)) %>%
addTiles()
ML Gaynor.
library(dplyr)
library(tidyr)
library(raster)
library(sp)
library(sf)
library(spatstat)
library(spThin)
library(fields)
library(lubridate)
library(CoordinateCleaner)
library(ggplot2)
library(ggspatial)
library(leaflet)
This is a function I created with Natalie Patten. It will be part of her R package gatoRs (Geographic And Taxonomic Occurrence R-based Scrubbing).
source("functions/gators.R")
rawdf <- read.csv("data/download/raw/Shortia_galacifolia_raw_20220712.csv")
How many observations do we start with?
nrow(rawdf)
## [1] 815
Inspect scientific names included in the raw df.
unique(rawdf$name)
## [1] "shortia galacifolia"
## [2] "sherwoodia galacifolia"
## [3] "Shortia galacifolia Torr. & A.Gray"
## [4] "Sherwoodia galacifolia (Torr. & A.Gray) House"
Create a list of accepted names based on the name column in your data frame
search <- c("Shortia galacifolia", "Sherwoodia galacifolia")
Filter to only include accepted name:
df <- filter_fix_names(rawdf, listofsynonyms = search, acceptedname = "Shortia galacifolia")
## [1] "List of synonyms:"
## [1] "Shortia galacifolia" "Sherwoodia galacifolia"
## [1] "Shortia galacifolia|Sherwoodia galacifolia"
## [1] "Looking at unique names in raw df"
## [1] "shortia galacifolia"
## [2] "sherwoodia galacifolia"
## [3] "Shortia galacifolia Torr. & A.Gray"
## [4] "Sherwoodia galacifolia (Torr. & A.Gray) House"
## [1] "Looking at unique names in cleaned df"
## [1] "shortia galacifolia"
## [2] "sherwoodia galacifolia"
## [3] "Shortia galacifolia Torr. & A.Gray"
## [4] "Sherwoodia galacifolia (Torr. & A.Gray) House"
How many observations do we have now?
nrow(df)
## [1] 815
Merge the two coordinate columns
df$Latitude <- dplyr::coalesce(df$Latitude, df$spocc.latitude)
df$Longitude <- dplyr::coalesce(df$Longitude, df$spocc.longitude)
If spocc didn’t download any latitude/longitude values and there are 0 values in the spocc.latitude or spocc.longitude columns, skip this step. The two columns could have different classes; if so, find the fix here.
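If coalesce() complains that the two columns are incompatible, they most likely have different classes (e.g., character vs. numeric). A minimal sketch of the fix, assuming the column names used above:

```r
# Coerce both coordinate columns to numeric before coalescing
df$Latitude <- as.numeric(df$Latitude)
df$spocc.latitude <- as.numeric(df$spocc.latitude)
df$Latitude <- dplyr::coalesce(df$Latitude, df$spocc.latitude)
```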
Merge the two date columns
df$date <- dplyr::coalesce(df$date, df$spocc.date)
Subset the columns
df <- df %>%
dplyr::select(ID = ID,
name = new_name,
basis = basis,
coordinateUncertaintyInMeters = coordinateUncertaintyInMeters,
informationWithheld = informationWithheld,
lat = Latitude,
long = Longitude,
date = date)
Filter out NAs
df <- df %>%
filter(!is.na(long)) %>%
filter(!is.na(lat))
How many observations do we have now?
nrow(df)
## [1] 259
Round to two decimal places
df$lat <- round(df$lat, digits = 2)
df$long <- round(df$long, digits = 2)
Remove points at 0.00, 0.00
df <- df %>%
filter(long != 0.00) %>%
filter(lat != 0.00)
Remove coordinates in cultivated zones and botanical gardens, as well as those outside our desired range
df <- CoordinateCleaner::cc_inst(df,
lon = "long",
lat = "lat",
species = "name")
## Testing biodiversity institutions
## Removed 0 records.
Next, we identify and remove geographic outliers.
df <- CoordinateCleaner::cc_outl(df,
lon = "long",
lat = "lat",
species = "name")
## Testing geographic outliers
## Removed 44 records.
How many observations do we have now?
nrow(df)
## [1] 215
Parse dates into the same format
df$date <- lubridate::ymd(df$date)
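ymd() coerces a variety of year-month-day formats to Date and returns NA (with a warning) for anything it cannot parse, which is handy for spotting malformed dates. Toy values:

```r
library(lubridate)
ymd("2022-07-12")  # parses to the Date 2022-07-12
ymd("20220712")    # same date, different input format
ymd("July 2022")   # NA, plus a warning that it failed to parse
```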
Separate date into year, month, day
df <- df %>%
mutate(year = lubridate::year(date),
month = lubridate::month(date),
day = lubridate::day(date))
df <- distinct(df, lat, long, year, month, day, .keep_all = TRUE)
How many observations do we have now?
nrow(df)
## [1] 211
Maxent retains only one point per pixel, so to keep the ecological niche analyses comparable, we thin our records to one point per pixel.
bio1 <- raster("data/climate_processing/bioclim/bio_1.tif")
rasterResolution <- max(res(bio1))
Remove any point whose nearest-neighbor distance is smaller than the raster resolution (i.e., remove one point of each pair that falls within a single pixel)
while(min(nndist(df[,6:7])) < rasterResolution){
nnD <- nndist(df[,6:7])
df <- df[-(which(min(nnD) == nnD) [1]), ]
}
How many observations do we have now?
nrow(df)
## [1] 205
Reduce the effects of sampling bias using a randomization approach.
nnDm <- rdist.earth(as.matrix(data.frame(lon = df$long, lat = df$lat)), miles = FALSE, R = NULL)
nnDmin <- do.call(rbind, lapply(1:5, function(i) sort(nnDm[,i])[2]))
min(nnDmin)
## [1] 0.9109077
keep <- spThin::thin(loc.data = df,
verbose = FALSE,
long.col = "long",
lat.col = "lat",
spec.col = "name",
thin.par = 0.002, # Studies found 2m distance was enough to collect unique genets
reps = 1,
locs.thinned.list.return = TRUE,
write.files = FALSE)[[1]]
df <- df %>%
      filter(lat %in% keep$Latitude,
             long %in% keep$Longitude)
nrow(df)
## [1] 205
df_fixed <- st_as_sf(df, coords = c("long", "lat"), crs = 4326)
USA <- borders(database = "usa", colour = "gray50", fill = "gray50")
state <- borders(database = "state", colour = "black", fill = NA)
Here we are using ggplot2 and ggspatial to plot our occurrence records. We are using two basemaps, USA and state. The geom_point function adds our points based on long and lat, and colors them blue. We set the coordinate visualization space with coord_sf. We fixed the x and y labels. Finally, we added a scale and north arrow.
simple_map <- ggplot() +
USA +
state +
geom_sf(df_fixed,
mapping = aes(col = name),
col = "blue") +
coord_sf(xlim = c(min(df$long) - 3, max(df$long) + 3),
ylim = c(min(df$lat) - 3, max(df$lat) + 3)) +
xlab("Longitude") +
ylab("Latitude") +
annotation_scale() +
annotation_north_arrow(height = unit(1, "cm"),
width = unit(1, "cm"),
location = "tl")
simple_map
Another fun way to view these points is with leaflet.
leaflet(df_fixed) %>%
addMarkers(label = paste0(df$long, ", ", df$lat)) %>%
addTiles()
write.csv(df, "data/cleaning_demo/Shortia_galacifolia_20220712-cleaned.csv", row.names = FALSE)
alldf <- list.files("data/cleaning_demo/", full.names = TRUE,
recursive = FALSE, include.dirs = FALSE, pattern = "*.csv")
alldf <- lapply(alldf, read.csv)
alldf <- do.call(rbind, alldf)
Visually inspect records to verify no additional records should be removed.
## Plot all records
### Make points spatial
alldf_fixed <- st_as_sf(alldf, coords = c("long", "lat"), crs = 4326)
### Set basemap
USA <- borders(database = "usa", colour = "gray50", fill = "gray50")
state <- borders(database = "state", colour = "black", fill = NA)
### Plot
all_map <- ggplot() +
USA +
state +
geom_sf(alldf_fixed,
mapping = aes(col = factor(name))) +
coord_sf(xlim = c(min(alldf$long) - 3, max(alldf$long) + 3),
ylim = c(min(alldf$lat) - 3, max(alldf$lat) + 3)) +
xlab("Longitude") +
ylab("Latitude") +
annotation_scale() +
annotation_north_arrow(height = unit(1, "cm"),
width = unit(1, "cm"),
location = "tl")
all_map
alldf <- alldf %>%
dplyr::select(name, lat, long)
write.csv(alldf, "data/cleaning_demo/maxent_ready/diapensiaceae_maxentready_20220712.csv", row.names = FALSE)
Create a batch file for GeoLocate. Created by ME Mabry.
library(dplyr)
library(tidyr)
library(plyr)
library(tibble)
rawdf <- read.csv("data/download/raw/Shortia_galacifolia_raw_20220712.csv")
rawdf$Latitude <- dplyr::coalesce(rawdf$Latitude, rawdf$spocc.latitude)
rawdf$Longitude <- dplyr::coalesce(rawdf$Longitude, rawdf$spocc.longitude)
rawdf <- rawdf %>%
dplyr::select(ID = ID,
name = name,
basis = basis,
coordinateUncertaintyInMeters = coordinateUncertaintyInMeters,
informationWithheld = informationWithheld,
country = country,
locality = locality,
lat = Latitude,
long = Longitude,
date = date,
state = state,
county = county)
rawdf_GeoRef <- filter(rawdf, is.na(lat))
rawdf_GeoRef <- rawdf_GeoRef %>%
dplyr::select("locality string" = locality,
country = country,
state = state,
county = county,
latitude = lat,
longitude = long,
ID = ID,
name = name,
basis = basis)
rawdf_GeoRef$'correction status' <- ""
rawdf_GeoRef$precision <- ""
rawdf_GeoRef$'error polygon' <- ""
rawdf_GeoRef$'multiple results' <- ""
rawdf_GeoRef2<- rawdf_GeoRef[,c("locality string", "country", "state", "county", "latitude",
"longitude", "correction status", "precision", "error polygon",
"multiple results", "ID", "name", "basis")]
write.csv(rawdf_GeoRef2,
file = "data/georeferencing/Shortia_galacifolia_Needing_GeoRef_20220712.csv",
row.names = FALSE)
Climate layer processing.
Modified and created by ML Gaynor.
Based on code by Mike Belitz (mbelitz/Odo_SDM_Rproj).
library(raster)
library(gtools)
library(dplyr)
library(rgdal)
library(sp)
library(rangeBuilder)
library(sf)
library(caret)
library(usdm)
library(dismo)
library(stringr)
Based on code by Mike Belitz (mbelitz/Odo_SDM_Rproj).
source("functions/VIFLayerSelect.R")
biolist <- list.files("data/climate_processing/bioclim/", pattern = "*.tif", full.names = TRUE)
Order list using gtools.
biolist <- mixedsort(sort(biolist))
Load rasters as a stack.
biostack <- raster::stack(biolist)
Fix the name column class to make sure it is a character vector.
alldf <- read.csv("data/cleaning_demo/maxent_ready/diapensiaceae_maxentready_20220712.csv")
alldf$name <- as.character(alldf$name)
Here we will make the projection layers, or the layers which include shared space. First, we have to define the accessible space.
pts <- cbind(alldf$long, alldf$lat)
alldfsp <- SpatialPointsDataFrame(pts, data = alldf, proj4string = CRS("EPSG:4326"))
hull <- getDynamicAlphaHull(x = alldfsp@coords,
fraction = 1, # min. fraction of records we want included
partCount = 1, # number of polygons
initialAlpha = 20, # initial alpha size, 20m
clipToCoast = "terrestrial",
proj = "+proj=longlat +datum=WGS84")
## Warning in proj4string(hull): CRS object has comment, which is lost in output; in tests, see
## https://cran.r-project.org/web/packages/sp/vignettes/CRS_warnings.html
## Warning in sp::spTransform(gshhs, CRS(proj4string(hull))): NULL source CRS
## comment, falling back to PROJ string
## Warning in wkt(obj): CRS object has no comment
plot(hull[[1]], col=transparentColor('gray50', 0.5), border = NA)
points(x = alldf$long, y = alldf$lat, cex = 0.5, pch = 3)
Transform the hull and the occurrence points into an equal-area CRS measured in meters, then take the 80th percentile of the nearest-neighbor distances between points
hullTrans <- spTransform(hull[[1]], "+proj=cea +lat_ts=0 +lon_0=0")
alldfspTrans <- spTransform(alldfsp, "+proj=cea +lat_ts=0 +lon_0=0")
buffDist <- quantile(x = (apply(spDists(alldfspTrans), 2,
                                FUN = function(x) sort(x)[2])),
                     probs = 0.80, na.rm = TRUE)
buffDist
## 80%
## 5862.504
buffer_m <- buffer(x = hullTrans, width = buffDist, dissolve = TRUE)
buffer <- spTransform(buffer_m, CRS("EPSG:4326"))
plot(buffer, col=transparentColor('gray50', 0.5), border = NA)
points(x = alldf$long, y = alldf$lat, cex = 0.5, pch = 3)
path <- "data/climate_processing/all/"
end <- ".asc"
for(i in 1:length(biolist)){
# Subset raster layer
rast <- biostack[[i]]
# Setup file names
name <- names(rast)
out <- paste0(path, name)
outfile <- paste0(out, end)
# Crop and mask
c <- crop(rast, extent(buffer))
c <- mask(c, buffer)
# Write raster
writeRaster(c, outfile, format = "ascii", overwrite = TRUE)
}
Warning: Many of these steps are time intensive! Do not run them during the workshop; we have already done this for you.
We only want to include layers that are not highly correlated. To assess which layers to include, we can look at the Pearson correlation coefficients among layers.
clippedlist <- list.files("data/climate_processing/all/", pattern = "*.asc", full.names = TRUE)
clippedlist <- mixedsort(sort(clippedlist))
clippedstack <- raster::stack(clippedlist)
Warning: Time Intensive!
corr <- layerStats(clippedstack, 'pearson', na.rm=TRUE)
Isolate the Pearson correlation coefficients and take their absolute values.
c <- abs(corr$`pearson correlation coefficient`)
write.csv(c, "data/climate_processing/correlationBioclim.csv", row.names = FALSE)
Highly correlated layers (|r| > 0.80) can impact the statistical significance of the niche models and therefore must be removed.
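To see how caret::findCorrelation behaves, here is a toy 3 x 3 correlation matrix (made-up values, not workshop data); it flags one layer from each highly correlated pair:

```r
library(caret)
toy <- matrix(c(1.00, 0.95, 0.10,
                0.95, 1.00, 0.20,
                0.10, 0.20, 1.00),
              nrow = 3,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
findCorrelation(toy, cutoff = 0.80, names = TRUE)  # one of the "a"/"b" pair is flagged
```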
envtCor <- mixedsort(sort(findCorrelation(c, cutoff = 0.80, names = TRUE, exact = TRUE)))
envtCor
## [1] "bio_1" "bio_3" "bio_4" "bio_6" "bio_10" "bio_11" "bio_12" "bio_13"
## [9] "bio_16" "bio_17" "bio_19"
Warning: Time Intensive!
VIF can detect multicollinearity in a set of multiple regression variables. Run a simple Maxent model for every species and calculate the average permutation contribution. Loop through each species and save the permutation importance in a list. Warning: this will print warnings even when it works fine.
set.seed(195)
m <- c()
for(i in 1:length(unique(alldf$name))){
species <- unique(alldf$name)[i]
spp_df <- alldf %>%
dplyr::filter(name == species)
coordinates(spp_df) <- ~ long + lat
model <- maxent(x = clippedstack, p = coordinates(spp_df),
progress = "text", silent = FALSE)
m[[i]] <- vimportance(model)
}
mc <- do.call(rbind, m)
mc_average <- aggregate(mc[, 2], list(mc$Variables), mean)
mc_average <- mc_average %>%
dplyr::select(Variables = Group.1, permutation.importance = x)
mc1 <- mc_average
Use VIF and the MaxEnt permutation importance to select the best variables for your model. Note: because the permutations are random, rerunning the models without setting a seed can select different layers.
selectedlayers <- VIF_layerselect(clippedstack, mc_average)
mixedsort(sort(names(selectedlayers)))
Since this can vary per system (despite setting the seed), we added this line to keep our files consistent for the workshop.
sl <- c("bio_3", "bio_7", "bio_8", "bio_9", "bio_14", "bio_15", "bio_18", "elev")
selectedlayers <- raster::subset(clippedstack, sl)
Move selected layers to “PresentLayers/all/” for use in MaxEnt later.
for(i in 1:length(names(selectedlayers))){
name <- names(selectedlayers)[i]
from <- paste0("data/climate_processing/all/", name, ".asc")
to <- paste0("data/climate_processing/PresentLayers/all/", name, ".asc")
file.copy(from, to,
overwrite = TRUE, recursive = FALSE,
copy.mode = TRUE)
}
for(i in 1:length(unique(alldf$name))){
species <- unique(alldf$name)[i]
# Subset species from data frame
spp_df <- alldf %>%
dplyr::filter(name == species)
# Make spatial
pts <- cbind(spp_df$long, spp_df$lat)
spp_df <- SpatialPointsDataFrame(pts, data = spp_df, proj4string=CRS("EPSG:4326"))
## Create alpha hull
sphull <- rangeBuilder::getDynamicAlphaHull(x = spp_df@coords,
fraction = 1, # min. fraction of records we want included
partCount = 1, # number of polygons
initialAlpha = 20, # initial alpha size, 20m
clipToCoast = "terrestrial",
proj = "+proj=longlat +datum=WGS84")
### Transform into CRS related to meters
sphullTrans <- spTransform(sphull[[1]], "+proj=cea +lat_ts=0 +lon_0=0")
spp_dfTrans <- spTransform(spp_df, "+proj=cea +lat_ts=0 +lon_0=0")
### Calculate buffer size
#### Here we take the 80th percentile of the nearest-neighbor distances between points
spbuffDist <- quantile(x = (apply(spDists(spp_dfTrans), 2, FUN = function(x) sort(x)[2])),
probs = 0.80, na.rm = TRUE)
### Buffer the hull
spbuffer_m <- buffer(x = sphullTrans, width = spbuffDist, dissolve = TRUE)
spbuffer <- spTransform(spbuffer_m, CRS("EPSG:4326"))
### Crop and Mask
spec <- gsub(" ", "_", species)
path <- paste0("data/climate_processing/PresentLayers/", spec,"/")
end <- ".asc"
for(j in 1:length(names(selectedlayers))){
# Subset raster layer
rast <- selectedlayers[[j]]
# Setup file names
name <- names(rast)
out <- paste0(path, name)
outfile <- paste0(out, end)
# Crop and mask
c <- crop(rast, extent(spbuffer))
c <- mask(c, spbuffer)
# Write raster
writeRaster(c, outfile, format = "ascii", overwrite = TRUE)
}
}
Ecological analysis using points. This is based on scripts created by Hannah Owens, James Watling, and Anthony Melton.
Modified and created by ML Gaynor.
library(raster)
library(dplyr)
library(tidyr)
library(gtools)
library(factoextra)
library(FactoMineR)
library(multcompView)
library(ggsci)
library(gridExtra)
library(ggplot2)
library(ecospat)
library(dismo)
library(ade4)
alldf <- read.csv("data/cleaning_demo/maxent_ready/diapensiaceae_maxentready_20220712.csv")
list <- list.files("data/climate_processing/PresentLayers/all", full.names = TRUE, recursive = FALSE)
list <- mixedsort(sort(list))
envtStack <- stack(list)
For each occurrence record, extract the value of each bioclim variable using the package raster.
ptExtracted <- raster::extract(envtStack, alldf[3:2])
ptExtracteddf <- as.data.frame(ptExtracted)
ptExtracteddf <- ptExtracteddf %>%
dplyr::mutate(name = as.character(alldf$name), x = alldf$long, y = alldf$lat)
ptExtracteddf <- ptExtracteddf %>%
tidyr::drop_na(bio_3, bio_7, bio_8, bio_9, bio_14, bio_15, bio_18, elev)
data.bioclim <- ptExtracteddf[, 1:8]
data.species <- ptExtracteddf[, 9]
data.pca <- prcomp(data.bioclim, scale. = TRUE)
When you use the command prcomp, your loading variables show up as rotation variables. Thanks to a really great answer on Stack Overflow, you can even convert the rotation variables to show the relative contribution.
loadings <- data.pca$rotation
summary(loadings)
## PC1 PC2 PC3 PC4
## Min. :-0.4412 Min. :-0.2724 Min. :-0.64672 Min. :-0.424176
## 1st Qu.:-0.3611 1st Qu.:-0.2104 1st Qu.:-0.39439 1st Qu.:-0.261111
## Median :-0.3191 Median : 0.2210 Median : 0.05544 Median :-0.002404
## Mean :-0.1145 Mean : 0.1396 Mean :-0.06823 Mean : 0.026802
## 3rd Qu.: 0.1794 3rd Qu.: 0.3399 3rd Qu.: 0.22268 3rd Qu.: 0.195788
## Max. : 0.4026 Max. : 0.6699 Max. : 0.35440 Max. : 0.693426
## PC5 PC6 PC7 PC8
## Min. :-0.3757 Min. :-0.76021 Min. :-0.6064 Min. :-0.46111
## 1st Qu.:-0.2111 1st Qu.:-0.38829 1st Qu.:-0.4301 1st Qu.:-0.01278
## Median : 0.3307 Median :-0.02424 Median :-0.1909 Median : 0.14040
## Mean : 0.1403 Mean :-0.14128 Mean :-0.1633 Mean : 0.14398
## 3rd Qu.: 0.4042 3rd Qu.: 0.05884 3rd Qu.: 0.1460 3rd Qu.: 0.27908
## Max. : 0.4495 Max. : 0.32091 Max. : 0.2800 Max. : 0.62820
There are two options to convert the loadings to show the relative contribution; both give the same answer, so either can be used.
loadings_relative_A <- t(t(abs(loadings))/rowSums(t(abs(loadings))))*100
loadings_relative_A
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## bio_3 12.351031 7.817062 15.694846 30.253573 13.782057 2.5529480 11.192623
## bio_7 14.392069 10.152713 18.714155 1.724893 15.488151 2.3322268 24.245310
## bio_8 14.793350 12.217596 8.970650 1.934623 10.321323 36.9229504 5.325418
## bio_9 11.101897 9.380471 26.974050 14.538194 13.689124 19.2400201 5.883375
## bio_14 16.211651 7.530844 1.611266 18.506444 14.478171 0.1983369 7.364618
## bio_15 3.991946 26.060031 3.013367 5.734652 7.455848 15.5864265 9.377266
## bio_18 12.958284 16.245219 10.239994 10.343363 16.378703 4.4353670 16.080716
## elev 14.199772 10.596062 14.781672 16.964258 8.406624 18.7317244 20.530674
## PC8
## bio_3 0.1743179
## bio_7 7.2956146
## bio_8 5.6968636
## bio_9 1.8425332
## bio_14 29.0667559
## bio_15 26.0560908
## bio_18 21.3357900
## elev 8.5320340
loadings_relative_B <- sweep(x = abs(loadings),
MARGIN = 2,
STATS = colSums(abs(loadings)), FUN = "/")*100
loadings_relative_B
First, I made a theme to change the background of the plot. Next, I changed the plot margins and the text size.
theme <- theme(panel.background = element_blank(),
panel.border=element_rect(fill=NA),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
strip.background=element_blank(),
axis.ticks=element_line(colour="black"),
plot.margin=unit(c(1,1,1,1),"line"),
axis.text = element_text(size = 12),
legend.text = element_text(size = 12),
legend.title = element_text(size = 12),
text = element_text(size = 12))
pal <- pal_locuszoom()(4)
Next, use ggbiplot, where obs.scale indicates the scale factor to apply to observations, var.scale indicates the scale factor to apply to variables, ellipse = TRUE draws a normal data ellipse for each group, and circle = TRUE draws a correlation circle.
library(ggbiplot)
g <- ggbiplot(data.pca, obs.scale = 1, var.scale = 1,
groups = data.species, ellipse = TRUE, circle = TRUE) +
scale_color_manual(name = '', values = pal) +
theme(legend.direction = 'vertical', legend.position = 'right',
legend.text = element_text(size = 12, face = "italic")) +
theme
g
Simple function to run an ANOVA and a post-hoc Tukey-HSD test
stat.test <- function(dataframe, x = "name", y){
bioaov <- aov(as.formula(paste0(y,"~",x)), data = dataframe)
TH <- TukeyHSD(bioaov, "name")
m <- multcompLetters(TH$name[,4])
y.val <- as.numeric(max(dataframe[[y]]) + 1)
groups <- data.frame(groups = m$Letters, name = names(m$Letters), y.val = rep(y.val, 4))
return(groups)
}
bio3aovplot <- ggplot(ptExtracteddf, aes(x = name, y = bio_3)) +
geom_boxplot(aes(fill = name)) +
scale_color_manual(name = '', values = pal) +
geom_text(data = stat.test(dataframe =ptExtracteddf, y = "bio_3"),
mapping = aes(x = name,
y = max(ptExtracteddf["bio_3"]+1),
label = groups),
size = 5, inherit.aes = FALSE) +
theme(axis.text.x = element_text(angle = 90, size = 8, face = 'italic'))
bio3aovplot
variablelist <- colnames(ptExtracteddf)[1:8]
plotlist <- list()
maxstats <- c()
statsout <- c()
for(i in 1:8){
vname <- variablelist[i]
maxstats[i] <- as.numeric(max(ptExtracteddf[[vname]]) + 1)
statsout[[i]] <- stat.test(dataframe = ptExtracteddf, y = vname)
plotlist[[i]] <- ggplot(ptExtracteddf, aes(x = name, y = .data[[vname]])) +
geom_boxplot(aes(fill = name)) +
scale_colour_manual(name = 'Species', values = pal) +
geom_text(data = statsout[[i]],
mapping = aes(x = name,
y = y.val,
label = groups),
size = 5, inherit.aes = FALSE) +
scale_x_discrete(labels = c('G', 'Pba','Pbr', 'S')) +
ggtitle(label = vname) +
ylab(vname) +
theme(legend.position = "none") +
ylim(0, (maxstats[i]*1.2))
}
gridExtra::grid.arrange(grobs = plotlist)
## Warning: Removed 15 rows containing non-finite values (stat_boxplot).
## Warning: Removed 1 rows containing non-finite values (stat_boxplot).
bg1 <- randomPoints(mask = envtStack, n = 1000, p = alldf[,3:2])
bg1.env <- raster::extract(envtStack, bg1)
bg1.env <- data.frame(bg1.env)
allpt.bioclim <- rbind(bg1.env, data.bioclim)
pca.env <- dudi.pca(allpt.bioclim,
center = TRUE, # Center by the mean
scannf = FALSE, # Don't plot
nf = 2) # Number of axis to keep
p1.score <- suprow(pca.env, dplyr::filter(ptExtracteddf, name == "Galax urceolata")[, 1:8])$li
p2.score <- suprow(pca.env, dplyr::filter(ptExtracteddf, name == "Pyxidanthera barbulata")[, 1:8])$li
p3.score <- suprow(pca.env, dplyr::filter(ptExtracteddf, name == "Pyxidanthera brevifolia")[, 1:8])$li
p4.score <- suprow(pca.env, dplyr::filter(ptExtracteddf, name == "Shortia galacifolia")[, 1:8])$li
scores.clim <- pca.env$li
plot(scores.clim, pch = 16, asp = 1,
col = adjustcolor(1, alpha.f = 0.2), cex = 2,
xlab = "PC1", ylab = "PC2")
points(p1.score, pch = 18, col = pal[1], cex = 2)
points(p2.score, pch = 18, col = pal[2], cex = 2)
points(p3.score, pch = 18, col = pal[3], cex = 2)
points(p4.score, pch = 18, col = pal[4], cex = 2)
Create occurrence density grids based on the ordination data.
z1 <- ecospat.grid.clim.dyn(scores.clim, scores.clim, p1.score, R = 100)
## Registered S3 methods overwritten by 'adehabitatMA':
## method from
## print.SpatialPixelsDataFrame sp
## print.SpatialPixels sp
z2 <- ecospat.grid.clim.dyn(scores.clim, scores.clim, p2.score, R = 100)
z3 <- ecospat.grid.clim.dyn(scores.clim, scores.clim, p3.score, R = 100)
z4 <- ecospat.grid.clim.dyn(scores.clim, scores.clim, p4.score, R = 100)
zlist <- list(z1, z2, z3, z4)
Schoener’s D ranges from 0 to 1, where 0 represents no overlap between niches and 1 represents completely identical niches.
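For intuition, Schoener’s D compares two normalized density grids cell by cell: D = 1 - 0.5 * sum(|p1 - p2|). A toy sketch with made-up densities (not the ecospat grids used below):

```r
p1 <- c(0.6, 0.3, 0.1, 0.0)  # normalized density of species 1 across 4 cells
p2 <- c(0.0, 0.1, 0.3, 0.6)  # species 2 concentrated in the opposite cells
D  <- 1 - 0.5 * sum(abs(p1 - p2))
D  # 0.2: low overlap; identical vectors would give D = 1
```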
overlapD <- matrix(ncol = 2, nrow = 7)
n <- 1
for(i in 1:3){
for(j in 2:4){
if(i != j){
overlapD[n, 1]<- paste0("z", i, "-", "z", j)
overlapD[n, 2]<- ecospat.niche.overlap(zlist[[i]], zlist[[j]], cor = TRUE)$D
n <- n + 1
}
}
}
overlapDdf <- data.frame(overlapD)
overlapDdf
## X1 X2
## 1 z1-z2 0.085115840903214
## 2 z1-z3 0.00641344441404568
## 3 z1-z4 0.476780105514052
## 4 z2-z3 0.0115479048153222
## 5 z2-z4 0.00566124942172497
## 6 z3-z2 0.0115479048153222
## 7 z3-z4 0
par(mfrow=c(1,2))
ecospat.plot.niche.dyn(z1, z4, quant=0.25, interest = 1
, title= "Niche Overlap - Z1 top", name.axis1="PC1", name.axis2="PC2")
ecospat.plot.niche.dyn(z1, z4, quant=0.25, interest = 2
, title= "Niche Overlap - Z4 top", name.axis1="PC1", name.axis2="PC2")
Based on Warren et al. 2008 - Are the two niches identical?
This is a hypothesis test for D, with the null distribution built by randomization. H1: the observed niche overlap is higher than expected by chance (i.e., than when occurrences are randomized).
eq.test <- ecospat.niche.equivalency.test(z1, z4, rep = 10)
ecospat.plot.overlap.test(eq.test, "D", "Equivalency")
Based on Warren et al. 2008 - Are the two niches similar?
Can one species’ niche predict the occurrences of a second species better than expected by chance?
sim.test <- ecospat.niche.similarity.test(z1, z4, rep = 10, rand.type=2)
ecospat.plot.overlap.test(sim.test, "D", "Similarity")
Ecological Niche Modeling in R.
Modified and created by Anthony Melton and ML Gaynor.
This script is for generating and testing ENMs using ENMeval. Please see the paper describing ENMeval and its vignette.
options(java.parameters = "-Xmx16g") # increase memory available to rJava (note: no space in "-Xmx16g")
library(raster)
library(gtools)
library(dplyr)
library(dismo)
library(ENMeval)
library(ggplot2)
library(viridis)
source("functions/ENMevaluation.R")
alldf <- read.csv("data/cleaning_demo/maxent_ready/diapensiaceae_maxentready_20220712.csv")
Galax_urceolata <- dplyr::filter(alldf, name == "Galax urceolata")
Pyxidanthera_barbulata <- dplyr::filter(alldf, name == "Pyxidanthera barbulata")
Pyxidanthera_brevifolia <- dplyr::filter(alldf, name == "Pyxidanthera brevifolia")
Shortia_galacifolia <- dplyr::filter(alldf, name == "Shortia galacifolia")
# Use a descriptive name to avoid masking base::list
layer_files <- list.files("data/climate_processing/PresentLayers/all", full.names = TRUE, recursive = FALSE)
layer_files <- mixedsort(sort(layer_files))
allstack <- stack(layer_files)
gstack <- stack(mixedsort(sort(list.files("data/climate_processing/PresentLayers/Galax_urceolata/", full.names = TRUE))))
pbastack <- stack(mixedsort(sort(list.files("data/climate_processing/PresentLayers/Pyxidanthera_barbulata/", full.names = TRUE))))
pbrstack <- stack(mixedsort(sort(list.files("data/climate_processing/PresentLayers/Pyxidanthera_brevifolia/", full.names = TRUE))))
sstack <- stack(mixedsort(sort(list.files("data/climate_processing/PresentLayers/Shortia_galacifolia/", full.names = TRUE))))
projection(allstack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
projection(gstack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
projection(pbastack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
projection(pbrstack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
projection(sstack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
Warning: this step is time intensive (notably on an Apple M1 chip)! The arguments below match the MaxEnt GUI settings (see the Word document and PowerPoint).
evaldis <- dismo::maxent(x = gstack, p = Galax_urceolata[, c("long", "lat")], nbg = 10000,
args = c("projectionlayers=data/climate_processing/PresentLayers/all",
"responsecurves", "jackknife",
"outputformat=logistic","randomseed",
"randomtestpoints=25", "replicates=5",
"replicatetype=subsample", "maximumiterations=5000",
"writebackgroundpredictions","responsecurvesexponent",
"writeplotdata"),
removeDuplicates = TRUE
#,path = "data/Ecological_Niche_Modeling/enm_output/Galax_urceolata/"
)
## This is MaxEnt version 3.4.3
ENMeval will generate multiple models and test them with the specified method. Two important variables here are the regularization multiplier (RM) and the feature classes (FC). FC determines which response-curve shapes (linear, hinge, quadratic, product, and threshold) can be used in the model, and RM influences how many parameters are included in the model.
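The tune.args list below is expanded into every FC × RM combination, and each combination becomes one candidate model. A quick base-R sketch of the grid that ENMevaluate will fit, mirroring the settings used below:

```r
# Candidate model settings: feature classes crossed with regularization multipliers
tune_grid <- expand.grid(fc = c("L", "Q"), rm = 1:2)
tune_grid
#   fc rm
# 1  L  1
# 2  Q  1
# 3  L  2
# 4  Q  2
nrow(tune_grid)  # 4 candidate models
```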
eval1 <- ENMeval::ENMevaluate(occ = Galax_urceolata[, c("long", "lat")],
env = gstack,
tune.args = list(fc = c("L","Q"), rm = 1:2),
partitions = "block",
n.bg = 10000,
parallel = FALSE,
algorithm = 'maxent.jar',
user.eval = proc)
## This is MaxEnt version 3.4.3
Inspect the dismo model by opening its HTML report.
browseURL(evaldis@html)
Inspect the predictions from each candidate model.
maps <- eval1@predictions
plot(maps)
mod_overlap <- calc.niche.overlap(eval1@predictions, overlapStat = "D")
# kable() and kable_styling() require the knitr and kableExtra packages
kable(mod_overlap) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "200px")
| | fc.L_rm.1 | fc.Q_rm.1 | fc.L_rm.2 | fc.Q_rm.2 |
|---|---|---|---|---|
| fc.L_rm.1 | NA | NA | NA | NA |
| fc.Q_rm.1 | 0.8931787 | NA | NA | NA |
| fc.L_rm.2 | 0.9868645 | 0.8866661 | NA | NA |
| fc.Q_rm.2 | 0.8999516 | 0.9818840 | 0.8946668 | NA |
Identify the best model by selecting the model with the lowest average test omission rate and, to break ties, the highest average validation AUC.
results <- eval.results(eval1)
opt.seq <- results %>%
dplyr::filter(or.10p.avg == min(or.10p.avg)) %>%
dplyr::filter(auc.val.avg == max(auc.val.avg))
kable(opt.seq) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "200px")
| fc | rm | tune.args | auc.train | cbi.train | auc.diff.avg | auc.diff.sd | auc.val.avg | auc.val.sd | cbi.val.avg | cbi.val.sd | or.10p.avg | or.10p.sd | or.mtp.avg | or.mtp.sd | proc_auc_ratio.avg | proc_auc_ratio.sd | proc_pval.avg | proc_pval.sd | AICc | delta.AICc | w.AIC | ncoef |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Q | 2 | fc.Q_rm.2 | 0.8524554 | 0.998 | 0.0855877 | 0.0364185 | 0.8278837 | 0.0763053 | 0.8855 | 0.0469503 | 0.1284263 | 0.0727586 | 0.0041322 | 0.0047715 | 1.916903 | 0.0287362 | 0 | 0 | 11854.65 | 8.744809 | 0.0061021 | 7 |
mod.seq <- eval.models(eval1)[[opt.seq$tune.args]]
kable(mod.seq@results) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "200px")
| Statistic | Value |
|---|---|
| X.Training.samples | 482.0000 |
| Regularized.training.gain | 0.8771 |
| Unregularized.training.gain | 0.9226 |
| Iterations | 180.0000 |
| Training.AUC | 0.8366 |
| X.Background.points | 10474.0000 |
| bio_14.contribution | 13.1895 |
| bio_15.contribution | 6.2362 |
| bio_18.contribution | 3.2498 |
| bio_3.contribution | 14.0114 |
| bio_7.contribution | 4.0078 |
| bio_8.contribution | 13.7527 |
| bio_9.contribution | 7.5670 |
| elev.contribution | 37.9857 |
| bio_14.permutation.importance | 0.0000 |
| bio_15.permutation.importance | 37.3287 |
| bio_18.permutation.importance | 1.3948 |
| bio_3.permutation.importance | 11.8152 |
| bio_7.permutation.importance | 39.2240 |
| bio_8.permutation.importance | 1.6956 |
| bio_9.permutation.importance | 8.5417 |
| elev.permutation.importance | 0.0000 |
| Entropy | 8.3787 |
| Prevalence..average.probability.of.presence.over.background.sites. | 0.2383 |
| Fixed.cumulative.value.1.cumulative.threshold | 1.0000 |
| Fixed.cumulative.value.1.Cloglog.threshold | 0.0560 |
| Fixed.cumulative.value.1.area | 0.8887 |
| Fixed.cumulative.value.1.training.omission | 0.0124 |
| Fixed.cumulative.value.5.cumulative.threshold | 5.0000 |
| Fixed.cumulative.value.5.Cloglog.threshold | 0.1056 |
| Fixed.cumulative.value.5.area | 0.6944 |
| Fixed.cumulative.value.5.training.omission | 0.0560 |
| Fixed.cumulative.value.10.cumulative.threshold | 10.0000 |
| Fixed.cumulative.value.10.Cloglog.threshold | 0.1505 |
| Fixed.cumulative.value.10.area | 0.5399 |
| Fixed.cumulative.value.10.training.omission | 0.0975 |
| Minimum.training.presence.cumulative.threshold | 0.0166 |
| Minimum.training.presence.Cloglog.threshold | 0.0154 |
| Minimum.training.presence.area | 0.9931 |
| Minimum.training.presence.training.omission | 0.0000 |
| X10.percentile.training.presence.cumulative.threshold | 10.1387 |
| X10.percentile.training.presence.Cloglog.threshold | 0.1519 |
| X10.percentile.training.presence.area | 0.5364 |
| X10.percentile.training.presence.training.omission | 0.0996 |
| Equal.training.sensitivity.and.specificity.cumulative.threshold | 29.6595 |
| Equal.training.sensitivity.and.specificity.Cloglog.threshold | 0.3221 |
| Equal.training.sensitivity.and.specificity.area | 0.2220 |
| Equal.training.sensitivity.and.specificity.training.omission | 0.2220 |
| Maximum.training.sensitivity.plus.specificity.cumulative.threshold | 39.0791 |
| Maximum.training.sensitivity.plus.specificity.Cloglog.threshold | 0.4156 |
| Maximum.training.sensitivity.plus.specificity.area | 0.1349 |
| Maximum.training.sensitivity.plus.specificity.training.omission | 0.2946 |
| Balance.training.omission..predicted.area.and.threshold.value.cumulative.threshold | 1.8327 |
| Balance.training.omission..predicted.area.and.threshold.value.Cloglog.threshold | 0.0711 |
| Balance.training.omission..predicted.area.and.threshold.value.area | 0.8357 |
| Balance.training.omission..predicted.area.and.threshold.value.training.omission | 0.0145 |
| Equate.entropy.of.thresholded.and.original.distributions.cumulative.threshold | 15.8214 |
| Equate.entropy.of.thresholded.and.original.distributions.Cloglog.threshold | 0.2044 |
| Equate.entropy.of.thresholded.and.original.distributions.area | 0.4156 |
| Equate.entropy.of.thresholded.and.original.distributions.training.omission | 0.1452 |
plot(mod.seq)
dismo::response(mod.seq)
p <- predict(mod.seq, allstack)
# Make into a data frame
p_df <- as.data.frame(p, xy = TRUE)
# Plot
ggplot() +
geom_raster(data = p_df, aes(x = x, y = y, fill = layer)) +
geom_point(data= Galax_urceolata,
mapping = aes(x = long, y = lat),
col='red', cex=0.05) +
coord_quickmap() +
theme_bw() +
scale_fill_gradientn(colours = viridis::viridis(99),
na.value = "black")
save(mod.seq, file = "data/Ecological_Niche_Modeling/enm_output/ENMeval/GalaxENM.rda")
writeRaster(x = p, filename = "data/Ecological_Niche_Modeling/enm_output/ENMeval/GalaxENM.asc",
            format = "ascii", NAflag = -9999, overwrite = TRUE)
Processing Ecological Niche Models in R.
Script by Anthony Melton and ML Gaynor.
library(raster)
library(gtools)
library(dplyr)
library(ENMTools)
library(ENMeval)
library(ape)
library(RStoolbox)
library(hypervolume)
library(phytools)
The following command loads helper functions written by Anthony Melton, including make.binary.map(), which generates a binary predicted occurrence map.
source("functions/Functions_AEM.R")
sp1_enm.mx.b <- raster("data/Ecological_Niche_Modeling/enm_output/Galax_urceolata_avg.asc")
sp2_enm.mx.b <- raster("data/Ecological_Niche_Modeling/enm_output/Pyxidanthera_barbulata_avg.asc")
sp3_enm.mx.b <- raster("data/Ecological_Niche_Modeling/enm_output/Pyxidanthera_brevifolia_avg.asc")
sp4_enm.mx.b <- raster("data/Ecological_Niche_Modeling/enm_output/Shortia_galacifolia_avg.asc")
These steps are described in the previous script.
system("rm -rf data/climate_processing/PresentLayers/all/maxent.cache/")
list <- list.files("data/climate_processing/PresentLayers/all/",
full.names = TRUE, recursive = FALSE)
list <- mixedsort(sort(list))
allstack <- stack(list)
projection(allstack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
Niche breadth describes the range of environmental conditions a species occupies, scaled from 0 to 1. Values closer to 1 indicate a generalist species with wide tolerances; values closer to 0 indicate a more specialized species. The raster.breadth command in ENMTools measures the smoothness of suitability scores across a projected landscape: the higher the score, the more of the available niche space a species occupies.
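For intuition, the classic standardized Levins’ niche breadth can be computed by hand on a toy suitability surface. This is illustrative only; ENMTools::raster.breadth derives its B2 from the suitability raster itself and its exact formula may differ. All values below are hypothetical:

```r
# Classic standardized Levins' niche breadth on toy suitability scores
suit <- c(0.9, 0.8, 0.7, 0.1, 0.05, 0.05)  # hypothetical suitability per cell
p  <- suit / sum(suit)                     # rescale to proportions
B  <- 1 / sum(p^2)                         # Levins' B
B2 <- (B - 1) / (length(p) - 1)            # standardized to the 0-1 range
```

A uniform surface gives B2 = 1 (perfect generalist); all suitability concentrated in one cell gives B2 = 0 (extreme specialist).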
sp1_breadth <- ENMTools::raster.breadth(x = sp1_enm.mx.b)
sp1_breadth$B2
## [1] 0.6369572
sp2_breadth <- ENMTools::raster.breadth(x = sp2_enm.mx.b)
sp2_breadth$B2
## [1] 0.1665568
sp3_breadth <- ENMTools::raster.breadth(x = sp3_enm.mx.b)
sp3_breadth$B2
## [1] 0.04380163
sp4_breadth <- ENMTools::raster.breadth(x = sp4_enm.mx.b)
sp4_breadth$B2
## [1] 0.4567
Calculating niche overlap, Schoener’s D, with ENMEval - Schoener’s D ranges from 0 to 1, where 0 represents no similarity between the projections and 1 represents completely identical projections.
enm_stack.b <- stack(sp1_enm.mx.b, sp2_enm.mx.b, sp3_enm.mx.b, sp4_enm.mx.b)
names(enm_stack.b) <- c("Galax urceolata", "Pyxidanthera barbulata", "Pyxidanthera brevifolia","Shortia galacifolia" )
calc.niche.overlap(enm_stack.b, overlapStat = "D")
## Galax.urceolata Pyxidanthera.barbulata
## Galax.urceolata NA NA
## Pyxidanthera.barbulata 0.2108513 NA
## Pyxidanthera.brevifolia 0.1287981 0.1614858
## Shortia.galacifolia 0.6447093 0.1228339
## Pyxidanthera.brevifolia Shortia.galacifolia
## Galax.urceolata NA NA
## Pyxidanthera.barbulata NA NA
## Pyxidanthera.brevifolia NA NA
## Shortia.galacifolia 0.03923816 NA
Let’s look at niche overlap over a tree!
tree <- ape::read.tree(file =
"data/Ecological_Niche_Modeling/AOC_Test_Demo/diapensiaceae_subset.tre")
tree <- ape::compute.brlen(phy = tree, method = "grafen")
plot(tree)
tree <- drop.tip(tree, "Cyrilla_racemiflora")
plot(tree)
alldf <- read.csv("data/cleaning_demo/maxent_ready/diapensiaceae_maxentready_20220712.csv")
Galax_urceolata <- dplyr::filter(alldf, name == "Galax urceolata")
Pyxidanthera_barbulata <- dplyr::filter(alldf, name == "Pyxidanthera barbulata")
Pyxidanthera_brevifolia <- dplyr::filter(alldf, name == "Pyxidanthera brevifolia")
Shortia_galacifolia <- dplyr::filter(alldf, name == "Shortia galacifolia")
sp1 <- enmtools.species(species.name = "Galax_urceolata",
presence.points = Galax_urceolata[,3:2])
sp1$range <- background.raster.buffer(sp1$presence.points, 25000, mask = allstack)
sp1$background.points = background.points.buffer(points = sp1$presence.points, radius = 5000, n = 10000, mask = allstack[[1]])
##############################
sp2 <- enmtools.species(species.name = "Pyxidanthera_barbulata",
presence.points = Pyxidanthera_barbulata[,3:2])
sp2$range <- background.raster.buffer(sp2$presence.points, 25000, mask = allstack)
sp2$background.points = background.points.buffer(points = sp2$presence.points, radius = 5000, n = 10000, mask = allstack[[1]])
#############################
sp3 <- enmtools.species(species.name = "Pyxidanthera_brevifolia",
presence.points = Pyxidanthera_brevifolia[,3:2])
sp3$range <- background.raster.buffer(sp3$presence.points, 25000, mask = allstack)
sp3$background.points = background.points.buffer(points = sp3$presence.points, radius = 5000, n = 10000, mask = allstack[[1]])
#############################
sp4 <- enmtools.species(species.name = "Shortia_galacifolia",
presence.points = Shortia_galacifolia[,3:2])
sp4$range <- background.raster.buffer(sp4$presence.points, 25000, mask = allstack)
sp4$background.points = background.points.buffer(points = sp4$presence.points, radius = 5000, n = 10000, mask = allstack[[1]])
clade <- enmtools.clade(species = list(sp1, sp2, sp3, sp4), tree = tree)
check.clade(clade)
##
##
## An enmtools.clade object with 4 species
##
## Species names:
## Shortia_galacifolia Pyxidanthera_brevifolia Pyxidanthera_barbulata Galax_urceolata
##
## Tree:
##
## Phylogenetic tree with 4 tips and 3 internal nodes.
##
## Tip labels:
## Shortia_galacifolia, Pyxidanthera_brevifolia, Pyxidanthera_barbulata, Galax_urceolata
##
## Rooted; includes branch lengths.
##
##
## Data Summary:
| | species.names | in.tree | presence | background | range |
|---|---|---|---|---|---|
| Shortia_galacifolia | Shortia_galacifolia | TRUE | 205 | 10000 | present |
| Pyxidanthera_brevifolia | Pyxidanthera_brevifolia | TRUE | 47 | 10000 | present |
| Pyxidanthera_barbulata | Pyxidanthera_barbulata | TRUE | 332 | 10000 | present |
| Galax_urceolata | Galax_urceolata | TRUE | 482 | 10000 | present |
range.aoc <- enmtools.aoc(clade = clade, nreps = 10, overlap.source = "range")
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
summary(range.aoc)
##
##
## Age-Overlap Correlation test
##
## 10 replicates
##
## p values:
## (Intercept) empirical.df$age
## 0.1363636 0.1363636
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## NULL
glm.aoc <- enmtools.aoc(clade = clade, env = allstack, nreps = 10, overlap.source = "glm")
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
summary(glm.aoc)
##
##
## Age-Overlap Correlation test
##
## 10 replicates
##
## p values:
## (Intercept) empirical.df$age
## 0.2727273 0.2727273
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## NULL
The results here are pretty meaningless since we’re looking at very few species, but they serve the purpose of the demo. For range AOCs, an intercept > 0.5 and a negative slope are indicative of sympatric speciation, while an intercept < 0.5 and a positive slope are indicative of non-sympatric speciation. For niche overlap, a low intercept and a positive slope would indicate niche divergence.
This will generate a binary map: any cell with a suitability score greater than or equal to the lowest score at an occurrence point is coded as a presence. There are other ways to set the threshold, including percentiles or training statistics.
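make.binary.map() is a workshop helper from functions/Functions_AEM.R; the minimum-training-presence thresholding described above can be sketched in base R on a toy suitability matrix (all values and cell indices below are hypothetical):

```r
# Toy suitability surface and the cells known to contain occurrences
suitability <- matrix(c(0.9, 0.6, 0.2,
                        0.7, 0.4, 0.1,
                        0.3, 0.05, 0.01), nrow = 3, byrow = TRUE)
occ_cells <- c(1, 2, 4)  # column-major cell indices with occurrences

# Minimum training presence: lowest suitability at any occurrence cell
threshold <- min(suitability[occ_cells])

# Binary presence/absence map
binary_map <- suitability >= threshold
threshold  # 0.6
```

Cells at or above the threshold (3 of 9 here) become presences; everything else becomes an absence.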
sp1.dist <- make.binary.map(model = sp1_enm.mx.b, occ.dat = Galax_urceolata)
sp2.dist <- make.binary.map(model = sp2_enm.mx.b, occ.dat = Pyxidanthera_barbulata)
sp3.dist <- make.binary.map(model = sp3_enm.mx.b, occ.dat = Pyxidanthera_brevifolia)
sp4.dist <- make.binary.map(model = sp4_enm.mx.b, occ.dat = Shortia_galacifolia)
par(mfrow = c(2,2), mar = c(1,1,1,1))
plot(sp1.dist)
plot(sp2.dist)
plot(sp3.dist)
plot(sp4.dist)
Next, let’s work on getting some data from the predicted
distributions! Niche space can be thought of as a multi-dimensional
hypervolume. We’re using climatic data in this case, so we’re measuring
the hypervolume of climatic niche space occupied by these species.
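As a simplified intuition only (the hypervolume package uses kernel density estimates, not boxes), the volume a species occupies in scaled climate space can be pictured as the product of its ranges along each axis. The data frame and values below are hypothetical:

```r
# Toy occupied climate space: each row a presence cell, each column a scaled climate axis
clim <- data.frame(bio1  = c(0.2, 0.5, 0.8),
                   bio12 = c(0.1, 0.3, 0.4),
                   elev  = c(0.6, 0.7, 0.9))

# Axis-aligned bounding-box "volume": product of per-axis ranges
axis_ranges <- sapply(clim, function(x) diff(range(x)))
box_volume  <- prod(axis_ranges)
box_volume  # 0.6 * 0.3 * 0.3 = 0.054
```

Real hypervolumes are much tighter than this bounding box because they follow the density of the points rather than their extremes.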
Warning: This takes forever!
sp1.hv <- get_hypervolume(binary_projection = sp1.dist, envt = allstack)
sp2.hv <- get_hypervolume(binary_projection = sp2.dist, envt = allstack)
sp3.hv <- get_hypervolume(binary_projection = sp3.dist, envt = allstack)
sp4.hv <- get_hypervolume(binary_projection = sp4.dist, envt = allstack)
hv_set <- hypervolume_set(hv1 = sp1.hv, hv2 = sp2.hv, check.memory = F)
hypervolume_overlap_statistics(hv_set)
plot(hv_set)
get_volume(hv_set)